ABSTRACT

The Law School Admission Test (LSAT) was examined to see if the items on a 
form could be divided into subgroups in which items looked statistically 
similar within subgroups but statistically different between subgroups. If 
such subgrouping can be detected, it is likely that the 
subgroups of items measure different abilities, and the test can be described 
as "multidimensional." The multidimensionality of six forms of the LSAT was 
studied using factor analysis. Two subgroups of items, or "factors," were 
found for each of the six forms. The LSAT thus appears to measure two 
different reasoning abilities, inductive and deductive. The effect of 
dimensionality on equating was also examined by calibrating, with item 
response theory (IRT) methods, all items on a form to obtain a set of 
estimated item parameters (Set 1). The test was then divided into two 
homogeneous subgroups of items, each having been determined to represent a 
different ability. Items within the subgroups were recalibrated separately to 
obtain item parameter estimates, and these latter estimates were combined 
into a second set, Set 2. It was found that the equating tables based on Set 
1 were highly similar to those based on Set 2 item statistics. Although the 
IRT model theoretically requires one-dimensional tests, it appears to give 
satisfactory results with the LSAT. The equating tables appear to be 
adequate. Two appendixes contain LISREL computer program printouts for the 
factor solutions. (Contains 7 tables, 5 figures, and 26 references.) (SLD) 







TM034431 



LSAC RESEARCH REPORT SERIES 










The Effects of Dimensionality on True Score Conversion 
Tables for the Law School Admission Test 

Gregory Camilli 
Ming-mei Wang 
Jaqueline Fesq 



Law School Admission Council 
Statistical Report 92-01 
June 1992 









The Law School Admission Council is a nonprofit association of American and 
Canadian law schools. Law School Admission Services (Law Services) adminis- 
ters the Council's programs and provides services to the legal education 
community. 

The Law Services logo and LSAT® are registered marks of Law School Admission 
Services, Inc. 

Copyright © 1992 by Law School Admission Services, Inc. 

All rights reserved. This book may not be reproduced or transmitted, in whole 
or in part, by any means, electronic or mechanical, including photocopying, record- 
ing, or by any information storage and retrieval system, without permission of 
the publisher. For information, contact Publications, Law Services, Box 40, 661 
Penn Street, Newtown, PA 18940-0040, 215.968.1363. 






Executive Summary 



In this study, we examined the Law School Admis- 
sion Test (LSAT) to see if the items on a form could be 
divided into different subgroups where items 
looked statistically similar within the subgroups, 
but statistically different between subgroups. When 
subgrouping can be detected, it is likely that the 
subgroups of items measure different abilities and 
therefore the test can be described as measuring multi- 
ple abilities or as "multidimensional" for short. In 
contrast, the term "unidimensional" is used to de- 
scribe a test for which no subgroups exist — all the 
items look relatively similar from a statistical point of 
view. Such a test measures a single ability. 

The LSAT is equated so that a test score obtained in 
the current year is comparable to scores obtained in 
previous years. Technically, a test model based on 
item response theory (IRT) is used to equate each 
new form of the LSAT to the base scale. This IRT 
model makes the basic assumption that the LSAT is 
unidimensional, as defined above. It is possible that 
a violation of this assumption could lead to unsatis- 
factory equating results; however, it must be 
recognized that 1) most tests are multidimensional 
to some degree, and 2) all practical test models for 
equating are unidimensional. Therefore, the two im- 
portant issues with real tests concern the degree of 
multidimensionality of the test, and whether this 
has a practically significant effect on test equating. 

To explore these issues, we conducted an analysis 
of multidimensionality for six forms of the LSAT 
using factor analysis. This statistical technique is 
commonly used to determine whether statistical 
subgroups of items exist, and which items corre- 
spond to which subgroups. We found two 
subgroups of items or "factors" for each of the six 
forms. The following pattern of results was remark- 
ably consistent — the AR items corresponded to one 
factor, while the RC and LR items corresponded to 
the other. The main conclusion of the factor analysis 
component of this study was that the LSAT appears 
to measure two different reasoning abilities: induc- 
tive and deductive. Both RC and LR items appear 
to measure inductive reasoning, and AR items deductive 
reasoning. The item groupings identified 
are thus consistent with the content specifications 
of the LSAT. It is important to add that the analysis 
showed that these two reasoning abilities are highly 
correlated. 

The technique of Dorans and Kingston (1985) was 
used in this study to examine the effect of dimen- 
sionality on equating. In brief, we began by 
calibrating (with IRT methods) all items on a form 
to obtain a set (say Set I) of estimated item parame- 
ters (as, bs, and cs). Next, the test was divided into 
two homogeneous subgroups of items, each having 
been determined to represent a different ability (i.e., 
inductive and deductive reasoning). The items 
within these subgroups were then recalibrated sepa- 
rately to obtain item parameter estimates. These 
latter estimates were then combined into Set II. (All 
estimates were placed on the same scale.) 

If the LSAT were strictly unidimensional, then the 
estimated item parameters in Set I would be very 
close to the corresponding estimates in Set II (only 
small differences would be obtained due largely to 
sampling errors). In other words, the same item sta- 
tistics (as, bs, and cs) would be obtained whether 
AR items were included with RC+LR items or not. 
Consequently, if the item statistics were the same, the 
equating tables based on parameter Sets I and II 
would be practically identical. On the other hand, if 
nonignorable multidimensionality exists, then the re- 
sult of a single calibration of all LSAT items would 
differ noticeably from that of separate calibrations for 
the two subgroups of items. This could lead to differ- 
ent true score equating tables, depending on whether 
Set I or Set II item statistics were used. 

In this study, we found that the equating tables 
based on Set I item statistics were highly similar to 
those based on Set II item statistics. We concluded, 
as did Dorans and Kingston (1985), that violations 
of unidimensionality may not have a substantial im- 
pact on equating. Although the IRT model 
theoretically requires unidimensional tests, it ap- 
pears to give satisfactory results with the LSAT. 

Section 1. Overview 

In this document, we first report the results of primary 
and secondary factor analyses of six forms of 
the Law School Admission Test (LSAT). Secondly, 
we report findings of how multidimensionality — as 
assessed by the factor analyses — may affect test 
equating. Finally, we consider potentially important 
areas for future research. Each of these topics is 
summarized briefly in this section, and is covered 
more fully in an ensuing section. 

Factor Analyses 

The objective of the factor analyses is to assess the di- 
mensionality of LSAT items that are assembled into a 
test form. Though items for any particular form of the 
LSAT are administered in three sections — Analytical 
Reasoning (AR), Reading Comprehension (RC), and 
Logical Reasoning (LR) — two types of ability are in- 
cluded in the content specifications: inductive and 
deductive reasoning. It is highly probable that these 
item types and/or content domains are associated 
with statistical factors, and an analysis of dimensional- 
ity should reveal the extent of this association through 
a determination of 1) the number of underlying 
abilities or factors that adequately account for per- 
formance differences among examinees, and 2) the 
correlations among these abilities. 

Model-based methods estimate factor structures di- 
rectly from the data without resort to intermediate 
steps, i.e., correlations need not be computed. This is 
the strategy of full information (or IRT) factor analysis 
(Bock and Aitkin, 1981) which is available in the pro- 
gram TESTFACT (Wilson, Wood, and Gibbons, 1987). 
However, the full information approach was not used 
in this study because the AR and RC sections of the 
LSAT have a testlet structure (Wainer and Kiely, 1987), 
that is, a set of 4-8 items may pertain to a single pas- 
sage. Higher correlations among items within 
testlets than among items between testlets resulted in 
extraneous (and tenuous) common factors in prelimi- 
nary TESTFACT runs. To avoid this problem, a 
confirmatory approach was employed in this study to 
model item response dependencies. 

Though a confirmatory approach was taken initially, 
an exploratory approach was employed subsequently 
to examine the common factor structure based upon 
correlations among testlets. It is thus useful to de- 
scribe and distinguish these approaches. In this 
regard, Mulaik (1972) wrote 

In some situations where the researcher ap- 
proaches a new domain, with practically no 
knowledge of what to expect, he will simply col- 
lect a representative sample of the variables to be 
measured in that domain and subject these vari- 
ables to factor analysis. (Not very much in the 



way of a definitive theory about the variables 
should be expected to result from such use of factor 
analysis...) In other situations the researcher will 
have definite hypotheses as to the latent parameters 
present among the variables in a domain and will 
carefully select his variables so as to reveal the pres- 
ence of the latent parameters as clearly as possible. 
In this case the researcher is involved to some extent 
in confirmatory research. (p. 362) 

Mulaik also cited research (see pp. 363-366) indicating 
that in theoretically well developed areas "factor anal- 
ysis does very poorly in recovering the already 
known theory." In this regard, there are two argu- 
ments for using confirmatory methods to analyze the 
dimensionality of LSAT items. First, the physical lay- 
out of the test items — which can be divided into 
analytical reasoning (AR), reading comprehension 
(RC), and logical reasoning (LR) sections — suggests 
there may be three factors, and that a search for 
dimensionality should begin (but not necessarily 
end) in this neighborhood. Second, the LSAT item 
domain is theoretically well developed in terms of 
its content, and this establishes useful prior expecta- 
tions for the item structure. In particular, items 
appearing in the same passage should be more 
highly intercorrelated than cross-correlated with 
items of other passages, and passages should be cor- 
related higher with other passages in the same 
content area than those in different content areas. 

A confirmatory approach can be most effectively 
"targeted" to these two prior expectations. 

In the initial step of this study, correlation matrices 
were obtained for input to the general confirmatory 
factoring program LISREL 7 (Joreskog and Sorbom, 
1989). However, with dichotomous data this 
presents a problem since we are interested in corre- 
lations of (continuous) latent variables underlying 
the propensity of examinees to answer test questions 
correctly. For this purpose, tetrachoric 
correlations were estimated (along with their error 
variances), rather than phi coefficients computed 
from observed responses, which have been shown 
to produce strong artifacts associated with item dif- 
ficulty. Tetrachorics are less prone to this weakness; 
however, they tend to underestimate the latent cor- 
relations in the presence of guessing. This bias can 
be reduced by a simple adjustment described 
below. Although factoring tetrachorics is a piece- 
meal approach to assessing test structure 
(correlations are first estimated assuming both a 
multivariate distribution of latent variables and a 
parametric form for item response functions, and 
these estimates in turn are used to obtain estimates 
of model parameters), modern methods of factor- 
ing, as implemented by LISREL, appear to give 
results comparable to full information techniques 
(Bock, Gibbons, and Muraki, 1988). 
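
The difficulty artifact in phi coefficients mentioned above can be illustrated with a short calculation. The following is a minimal Python sketch, not part of the original analysis: the function names and the counts are hypothetical, and the ceiling formula is the standard bound on phi for two dichotomous items with unequal proportions correct. It shows that even when every examinee who passes the hard item also passes the easy one, phi stays far below 1.

```python
import math

def phi_coefficient(n11, n10, n01, n00):
    """Phi (Pearson) correlation for a 2x2 table of two dichotomous items.

    n11 = both correct, n10 = only item 1 correct,
    n01 = only item 2 correct, n00 = both incorrect.
    """
    item1_right, item1_wrong = n11 + n10, n01 + n00
    item2_right, item2_wrong = n11 + n01, n10 + n00
    denom = math.sqrt(item1_right * item1_wrong * item2_right * item2_wrong)
    return (n11 * n00 - n10 * n01) / denom

def max_phi(p1, p2):
    """Largest phi attainable given proportions correct p1 and p2."""
    lo, hi = sorted((p1, p2))
    return math.sqrt(lo * (1 - hi) / (hi * (1 - lo)))

# Hypothetical counts for 1000 examinees: an easy item (90% correct) and a
# hard item (30% correct), where no one passes the hard item but fails the
# easy one. This is the strongest association these margins allow.
table = dict(n11=300, n10=600, n01=0, n00=100)
print(round(phi_coefficient(**table), 3))
print(round(max_phi(0.9, 0.3), 3))
```

Both printed values agree, which is the point: the ceiling on phi is set by the two item difficulties rather than by the strength of the underlying relationship.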

In this study, tetrachorics were estimated from the 
full samples after an adjustment to the data for 
guessing. Upon obtaining the inter-item tetrachoric 
correlations and their variances, these matrices 
were input to LISREL to perform confirmatory fac- 
tor analysis by the method of diagonally weighted 
least squares. 1 A number of different models were 
hypothesized: 

1. A 3-factor solution was designed in which items 
from Analytical Reasoning (AR), Reading Com- 
prehension (RC), and Logical Reasoning (LR) 
each loaded on separate factors. 

2. An 11-factor solution was designed in which each 
testlet (defined as four or more test items sub- 
sumed under the same passage) of AR and RC 
items loaded on a separate factor, while LR items 
defined a single factor. 

3. A 12-factor solution was designed with one general 
factor and one specific factor for each testlet. 

4. A 13-factor solution was designed with two general 
factors and one specific factor for each testlet. The 
first general factor was constrained to have free 
loadings for AR items and loadings fixed at zero 
for RC and LR items. The second general factor 
was constrained by the opposite pattern of free and 
fixed loadings for the AR, RC, and LR items. 
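
The free and fixed loadings of the two general factors in the fourth model can be laid out as a pattern matrix. The following is a minimal Python sketch under stated assumptions: items are taken to be ordered AR, RC, LR, the section sizes are those reported for the June 1989 form in Table 1 (29, 34, 33), and the function name is hypothetical. The testlet-specific factors are left out because the testlet boundaries are not given in this section, and the sketch is not LISREL input.

```python
def general_factor_pattern(n_ar, n_rc, n_lr):
    """Return one (factor1_free, factor2_free) pair per item.

    True marks a freely estimated loading; False marks a loading fixed at zero.
    Factor 1 is free for AR items only; factor 2 is free for RC and LR items only.
    """
    sections = ["AR"] * n_ar + ["RC"] * n_rc + ["LR"] * n_lr
    return [(section == "AR", section != "AR") for section in sections]

pattern = general_factor_pattern(29, 34, 33)
free_on_first = sum(row[0] for row in pattern)
free_on_second = sum(row[1] for row in pattern)
print(len(pattern), free_on_first, free_on_second)
```

Each item has exactly one free general-factor loading, which is what the "opposite pattern" in the model description implies.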

The 11-factor solution was found to fit significantly 
(and practically) better than the 3-factor solution, 
while the 12- and 13-factor solutions fit 
marginally better than the 11-factor solution. We 
therefore focus on the results of the 11-factor solu- 
tion in this report. 

In the 11-factor model, correlations between the factors 
(i.e., the 10 testlets and the LR subtest) were 
estimated and then subjected to a second order fac- 
tor analysis in order to discover whether the LSAT 
is unidimensional. Upon examining the results of 
the secondary analyses, it was concluded that a 2- 
factor second-order solution was appropriate. 
(Results from the IRT factor analyses, based on sam- 
ples of 5,000 examinees, corroborated these results 
for one test form, but not another.) The following 
pattern of results was consistently obtained for six 
LSAT test forms: first, the AR testlets loaded highly 
on one factor, while the RC testlets loaded highly 
on the other; second, the LR subtest loaded on both 
factors, but more highly on the factor marked by 
the RC testlets; third, the estimated correlation 
among factors was about 0.7. 



The main conclusion of the factor analysis compo- 
nent of this study was that the LSAT appears to 
measure two different reasoning abilities: inductive 
and deductive. One possible interpretation is that 
one factor (RC+LR) relates to the ability to under- 
stand relevancy and to make inferences, and the 
other (AR) relates to the ability to analyze a prob- 
lem once it is discovered and defined. Another 
interpretation is that both RC and LR items appear 
to measure inductive reasoning, while AR items ap- 
pear to measure deductive reasoning. The item 
groupings identified in the factor analysis are thus 
partially congruent with the physical structure of 
the test items, but more importantly, are wholly con- 
sistent with the content specifications of the LSAT. 

It is important to note that the analysis showed 
these two reasoning abilities are highly correlated. 

A correlation of 0.7 between the two second-order 
factors suggests that a single dominant second- 
order factor could account for 85% of the common 
factor variances. 
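
One way to arrive at a figure of this size is sketched below. This is an assumption about the calculation, which the report does not show: for two factors correlated r, the 2 x 2 correlation matrix has eigenvalues 1 + r and 1 - r, so the largest eigenvalue carries (1 + r)/2 of the total variance. The function name is hypothetical.

```python
def first_component_share(r):
    """Share of total variance carried by the largest eigenvalue of the
    correlation matrix [[1, r], [r, 1]], whose eigenvalues are 1 + |r| and 1 - |r|."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    largest_eigenvalue = 1.0 + abs(r)
    return largest_eigenvalue / 2.0

share = first_component_share(0.7)
print(f"{share:.0%}")
```

With r = 0.7 the share is (1 + 0.7)/2, which matches the 85% quoted in the text.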

Equating 

The technique of Dorans and Kingston (1985) was 
used in this study to examine the effect of dimen- 
sionality on equating. In brief, this technique begins 
by calibrating (with IRT methods) an intact test to 
obtain a set (say Set I) of estimated item parameters 
(as, bs, and cs). Next, the test is divided into two homogeneous 
subgroups of items, each having been 
determined to represent a different content domain 
(logically and statistically). The items within these 
subgroups are recalibrated separately to obtain sets 
of item parameter estimates (call these Sets IIa and 
IIb). Finally, all sets of parameter estimates are 
placed on a common metric, and Sets IIa and IIb are 
combined (call this Set II). By convention, this step 
is referred to as the "equating step" and the result- 
ing ICC estimates are referred to as "equated ICCs." 

If the test is strictly unidimensional, the estimated 
ICCs in Set I would be very close to the correspond- 
ing ICCs in Set II (small differences would be 
obtained largely due to sampling errors), and the true 
score equating tables based on parameter Sets I and II 
would be practically identical. On the other hand, if 
nonignorable multidimensionality exists, then the re- 
sult of a single calibration of heterogeneous items (Set 
I) would differ noticeably from that of separate cali- 
brations for the two groups of homogeneous items 
(Set II). In the presence of multidimensionality, the 
ICCs estimated in Set I would represent an ability that 
is qualitatively different from the two (correlated) latent 
abilities representing the ICCs in Sets IIa and IIb, 
respectively. Thus, the two sets of "equated" ICC esti- 
mates would likely differ, leading to different true 
score equating tables. 
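
The comparison described above can be sketched in code. The following minimal Python example assumes the three-parameter logistic model with the conventional scaling constant 1.7, and uses small hypothetical parameter sets standing in for Set I, Set II, and the base form. It is an illustration of true score equating in general, not the authors' program or the operational LSAT procedure. A true score on the new form is mapped to an ability through the test characteristic curve, and that ability is then mapped to a true score on the base form.

```python
import math

D = 1.7  # conventional scaling constant for the logistic 3PL model

def icc(theta, a, b, c):
    """Three-parameter logistic item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def true_score(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(icc(theta, a, b, c) for a, b, c in items)

def theta_for_true_score(tau, items, lo=-8.0, hi=8.0, tol=1e-8):
    """Invert the monotone test characteristic curve by bisection."""
    if not true_score(lo, items) < tau < true_score(hi, items):
        raise ValueError("true score outside the invertible range")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if true_score(mid, items) < tau:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def convert(tau, new_form, base_form):
    """True score on the base form equivalent to tau on the new form."""
    return true_score(theta_for_true_score(tau, new_form), base_form)

# Hypothetical (a, b, c) estimates, all assumed to be on a common scale.
base_form = [(1.0, -1.0, 0.2), (0.8, 0.0, 0.2), (1.2, 1.0, 0.2)]
set_1 = [(0.90, -0.80, 0.2), (1.0, 0.1, 0.2), (1.10, 0.90, 0.2)]  # single calibration
set_2 = [(0.95, -0.85, 0.2), (1.0, 0.1, 0.2), (1.05, 0.95, 0.2)]  # separate calibrations

for tau in (1.0, 1.5, 2.0, 2.5):
    difference = convert(tau, set_1, base_form) - convert(tau, set_2, base_form)
    print(tau, round(difference, 3))
```

The printed differences play the role of the Set I versus Set II conversion differences discussed in the text; small values across the score range indicate that the two calibrations lead to practically the same equating table.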



Based on the results of the factor analyses in the 
first phase of this study, the item parameters and 
abilities for a given form were estimated for 

1. Heterogeneous (HT) sets of items. A heteroge- 
neous set was represented by all AR, RC, and LR 
items on a given form. The term "heteroge- 
neous" refers to the fact that we are combining 
items measuring different abilities. We also refer 
to the heterogeneous set as AR+RC+LR below. 

2. Homogeneous (HM) subsets of items. There are two 
homogeneous subsets: the AR items, on one hand, 
and the RC and LR items on the other. We refer to 
these subsets as AR and RC+LR below, respectively. 

The items in these sets were then calibrated with 
BILOG III (Mislevy and Bock, 1990) using default 
options, and were individually placed on the same 
scale using the characteristic curve method for scale 
transformation (commonly known as the TBSE pro- 
cedure) described in Stocking and Lord (1983). In 
particular, the a and b estimates in Set I were linked 
to their operational form counterparts which had 
been previously placed on the LSAT base scale. This 
procedure was then repeated for Sets IIa and IIb, 
separately, and the results were then combined into 
Set II. In this way, all estimates were placed on 
the LSAT base scale, resulting in the comparability 
of Sets I and II. 



The estimated as, bs, and cs were highly similar for 
four LSAT forms. A correlational analysis showed 
the correlations of bs from Sets I and II to be approximately 
0.99 for the forms, while the 
correlations among as ranged from 0.71 to 0.94 
across forms. The correlations of b estimates within 
Sets AR and RC+LR to their AR+RC+LR counter- 
parts were also high (r = 0.99). The correlations 
between as within Set AR ranged from 0.62 to 0.92, 
and within Set RC+LR from 0.97 to 0.98. For the 
AR+RC+LR Set across the four forms, the root mean 
square difference for as ranged from 0.07 to 0.14; for bs 
from 0.09 to 0.18; and for cs from 0.04 to 0.05. 
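
The two summary statistics used in this comparison, the correlation and the root mean square difference between paired parameter estimates, can be computed as in the following minimal Python sketch. The function names and the parameter values are hypothetical and are not taken from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of estimates."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

def rmsd(x, y):
    """Root mean square difference between paired estimates."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(x, y)) / len(x))

# Hypothetical b estimates for five items from the single calibration (Set I)
# and from the separate calibrations (Set II).
b_set_1 = [-1.20, -0.40, 0.10, 0.75, 1.60]
b_set_2 = [-1.10, -0.45, 0.20, 0.70, 1.50]
print(round(pearson_r(b_set_1, b_set_2), 3), round(rmsd(b_set_1, b_set_2), 3))
```

A high correlation can coexist with a nonzero root mean square difference, which is why the text reports both.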

The equivalence of the item parameter estimates was 
also examined within the context of the equating 
tables they produced. A true score conversion table 
was constructed for both the HT and HM calibrations 
separately using the June 1989 administration (0LSS2) 
as the base form. The HT and HM converted true 
scores were examined, and found to be highly similar. 
Throughout the true score range on form 0LSS2, the 
HT and HM calibrations resulted in true score conver- 
sion differences that ranged from -0.6 to +0.3 point. 
Thus, the effect of multidimensionality on true score 
conversions was found to be less than 1 point (or one 
question on form 0LSS2) for all four forms examined. 
We conclude, as did Dorans and Kingston (1985), that 
violations of unidimensionality may not have a sub- 
stantial impact on equating. However, the effects on 
certain individuals may not be negligible. In future re- 
search we will attempt to identify these examinees. 




Section 2. Description of the Data 

The data come from six different administrations of 
the LSAT during the years 1989 and 1990. Each test 
is divided into three sections: Analytical Reasoning 
(AR), Reading Comprehension (RC), and Logical 
Reasoning (LR). The dates of administration, the 
total number of examinees at each administration, 
and the total number of scored items in each section 
of the test are given below in Table 1. 

Table 1 

                       Number of    Total Items Scored    Total 
Date          Form     Examinees     AR     RC     LR     Items 

June 1989     0LSS2      22088       29     34     33       96 
Sept. 1989    0LSS1      43317       29     32     33       94 
Dec. 1989     0LSS3      43796       29     34     31       94 
Feb. 1990     0LSS5      29240       29     34     35       98 
June 1990     1LSS7      25597       29     34     34       97 
Oct. 1990     0LSS6      49644       29     34     35       98 



The Analytical Reasoning section and the Reading 
Comprehension section each consist of five "testlets" 
or passages. A testlet is a series of four 
to eight items that all refer to the same reading 
passage or problem. Each item response 
is coded as 0 = incorrect, 1 = correct, 2 = omit, 
3 = not reached, and 9 = not scored. Table 2 gives the 
average number correct, average number of omits, 
and average number of not reached items for each 
section of the test for each administration. It is evident 
from Table 2 that the average number of "omit" and 
"not reached" item responses across all three content 
areas is usually in the range of 1.0 to 1.5. 



Before the correlations between items were calculated, 
the scores for each person were adjusted for 
the omit and the not reached responses. A guessing 
factor of 0.20 was chosen because there are five pos- 
sible choices for each item. If a randomly generated 
number between 0 and 1 was less than or equal to 
0.20, the score of 2 (omit) or 3 (not reached) was 
recoded as a correct response. Otherwise, it was 
recoded as incorrect. This procedure is discussed 
more fully in the next section. (Statistics in Table 2 
were calculated prior to random substitution.) 
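
The recoding rule just described can be sketched as follows. This is a minimal Python illustration using the response codes given above; the function names and the seeded random generator are assumptions made for the example, not the authors' program.

```python
import random

OMIT, NOT_REACHED, NOT_SCORED = 2, 3, 9
GUESSING_RATE = 0.20  # five answer choices per item

def recode_response(code, rng):
    """Replace an omit or not-reached code with a random 0/1 score.

    The response becomes correct (1) when a uniform draw is at most 0.20,
    and incorrect (0) otherwise. Codes 0 and 1 pass through unchanged.
    """
    if code in (OMIT, NOT_REACHED):
        return 1 if rng.random() <= GUESSING_RATE else 0
    return code

def recode_examinee(responses, rng):
    """Drop not-scored items, then recode the remaining responses."""
    return [recode_response(code, rng) for code in responses if code != NOT_SCORED]

rng = random.Random(1992)  # fixed seed so the example is repeatable
print(recode_examinee([1, 0, 2, 3, 9, 1], rng))
```

Over many omitted or not-reached responses, about one in five is credited as correct, which is the chance rate for a five-choice item.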

Items that were not scored on final forms of the 
LSAT were omitted from the analyses. 

Table 2 

Descriptive statistics for omits, not reached, and number correct for six forms of the LSAT. 

                           June 1989         September 1989      December 1989 
                          (N = 22088)         (N = 43317)         (N = 43796) 
Variable                 Mean   Std Dev      Mean   Std Dev      Mean   Std Dev 

Analytical Reasoning 
  Omits                   .23     .90         .24    1.01         .23     .95 
  Not Reached             .40    1.75         .22    1.45         .26    1.76 
  Number Correct        18.14    5.52       17.98    5.65       18.98    5.59 
Reading Comprehension 
  Omits                   .14     .77         .09     .52         .13     .68 
  Not Reached             .47    2.91         .29    2.28         .30    1.77 
  Number Correct        19.91    6.05       19.56    4.98       19.81    6.06 
Logical Reasoning 
  Omits                   .09     .50         .08     .49         .12     .90 
  Not Reached             .30    2.64         .24    2.26         .14     .92 
  Number Correct        20.28    5.57       21.83    5.15       19.29    5.06 

                          February 1990        June 1990         October 1990 
                          (N = 29240)         (N = 25597)         (N = 49644) 
Variable                 Mean   Std Dev      Mean   Std Dev      Mean   Std Dev 

Analytical Reasoning 
  Omits                   .29    1.20         .20     .88         .19     .84 
  Not Reached             .36    2.21         .26    1.95         .14     .90 
  Number Correct        16.49    5.80       18.79    5.89       19.81    5.70 
Reading Comprehension 
  Omits                   .16     .79         .11     .64         .08     .51 
  Not Reached             .36    1.77         .33    2.44         .24    2.15 
  Number Correct        19.93    6.82       21.50    6.86       23.10    5.82 
Logical Reasoning 
  Omits                   .13     .62         .13     .65         .09     .51 
  Not Reached             .50    2.86         .26    1.30         .25    2.22 
  Number Correct        20.10    6.52       21.93    5.51       23.40    6.16 

Section 3. Pilot TESTFACT Runs 

As mentioned above, the LSAT has a testlet struc- 
ture for the AR and RC subtests. Thus, there is good 
reason to believe a priori that statistical dependen- 
cies exist among items within testlets. For example, 
Thissen, Steinberg, and Mooney (1989) analyzed a 
4-passage, 22-item test of reading comprehension 
in which the items appeared in clusters of 7, 4, 3, 
and 8. An item factor analysis with TESTFACT 
(n = 3,866) supported the existence of at least four 
factors, one of which was clearly a passage or test- 
let factor. In addition, there were two subpassage 
factors which were composed by the last few items 
in the first and fourth testlets. Thissen, Steinberg, 
and Mooney concluded 

Because there was one unambiguous passage 
factor, it appears that there are sometimes pas- 
sage factors, which poses as much of a problem 
for the assumption of unidimensionality among 
reading comprehension items as if there were al- 
ways passage factors. The subpassage factors also 
indicate that specific passage content influences 
the responses to some of the items following each 
passage. (pp. 249-250) 

Two pilot runs with TESTFACT were done with the 
June (0LSS2) and September (0LSS1) 1989 LSAT 
data sets to explore the possibility of passage struc- 
tures. For the June 1989 (sample n = 5,000) data, a 
2-factor solution was found, indicated by a 
χ² = 12.24 with 92 degrees of freedom (p ≈ 1.0) for a 
potential third factor. The pattern of item loadings 
on the two factors (rotated to an oblique solution) 
was unambiguous: all AR items loaded on one fac- 
tor, and all RC and LR items loaded on the other. 
However, one problem was noted with the solution. 
When residual correlations were inspected, it was 
clearly evident that large residuals were present 



within each cluster or testlet. This suggested the ex- 
istence of passage clusters that were not detected 
by TESTFACT. 
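
The p value attached to the chi-square statistic for the third factor can be checked approximately with the standard library alone. The following minimal Python sketch uses the Wilson-Hilferty normal approximation to the chi-square distribution; this is a swapped-in approximation chosen for illustration, the function name is hypothetical, and it is not how TESTFACT computes its probabilities.

```python
import math

def chi2_upper_tail(x, df):
    """Approximate P(X >= x) for a chi-square variable with df degrees of freedom.

    Wilson-Hilferty: (X / df) ** (1/3) is roughly normal with
    mean 1 - 2/(9 df) and variance 2/(9 df).
    """
    k = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - k)) / math.sqrt(k)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# June 1989: the statistic is far below its degrees of freedom,
# so a third factor adds essentially nothing.
print(chi2_upper_tail(12.24, 92) > 0.999)
```

Applied to the September 1989 statistic reported in the next paragraph (775.76 on 94 degrees of freedom), the same function gives a probability far below 0.001, in line with the text.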

For the September 1989 data a 3-factor solution was 
found, indicated by a χ² = 775.76 with 94 degrees of 
freedom (p < 0.001) for a third factor. (A 4-factor solution 
could not be run with the current PC version of 
TESTFACT.) The pattern of item loadings on the three 
factors (rotated to an oblique solution) was as follows: 

1. Factor I was defined by a single RC testlet (items 15-21) 
concerning supply-side economics; other RC 
and LR items loaded moderately on this factor. 

2. Factor II contained relatively high loadings for AR 
items with the exception of the first AR subtest. 
The RC items defining Factor I had moderately 
negative loadings on this factor. 

3. Factor III was defined by a single AR testlet (items 
1-6) concerning the composition of committees 
serving a university's board of trustees. Other 
AR items, as well as LR items, had low loadings 
on this factor. However, the RC testlet marking 
Factor I had loadings in the 0.12-0.20 range. 

Confirmatory LISREL solutions were compared 
with the TESTFACT solution for the June and Sep- 
tember 1989 data sets. The confirmatory solutions 
were more interpretable and stable across the 
forms. The LSAT has a passage structure that ob- 
structs the identification of common factors by full 
information techniques as currently implemented. 
Consequently, it seemed preferable to use a confir- 
matory approach that takes into account prior 
information regarding the item structure of the test. 

Section 4. Recent Work in Test Dimensionality 

The assessment of test dimensionality is a topic of 
much interest in the current psychometric litera- 
ture. McDonald (1981) suggested that under the 
assumption of unidimensionality off-diagonal en- 
tries in the item variance-covariance matrix at 
different levels of examinee ability should be close 
to zero. Rosenbaum (1984) proved mathematically 
that multidimensionality implies that the inter-item 
covariance for any two items must be greater than 
zero for groups of examinees with identical scores 
on the remaining items. Stout (1987, 1990) broad- 
ened these ideas by examining the asymptotic 
behavior of inter-item covariances as the size of an 
item pool increased. (But note that an "item pool" is 
not synonymous with a "test form.") He coined the 
phrase essentially unidimensional to refer to a test 
with one major factor and one or more minor fac- 
tors. Both Rosenbaum and Stout suggested 
statistical procedures for dimensionality testing. 

Reckase (1979) concretely demonstrated that 2- and 
3-parameter IRT models can consistently retrieve the 
major dimension of an essentially unidimensional 
test. However, he also showed that for a test with 
equally potent dimensions, IRT procedures provided 
inconsistent estimates of ability. Most useful tests are 
not unidimensional, yet may be essentially so. Hence, 
unidimensional test models and procedures may give 
satisfactory results. Hambleton (1989, p. 150) wrote 
that "What is required for the assumption of unidimensionality 
to be met to a satisfactory extent by a 
set of test data is a dominant component or factor." 

Dominance is presumably measured by the "fit" of 
a unifactor model relative to a multifactor model; 
however, this analysis may oversimplify the prob- 
lem of assessing dimensionality. Other aspects of 
dimensionality must be considered because models 
that are logically quite different can often be con- 
structed to fit data structures equally well. Thus, 
indices of statistical fit should not be the sole arbi- 
ters of dimensionality. Dimensionality is not a 
property of a test per se — it is context dependent. 
Two short examples may serve to illustrate prob- 
lems with a purely "objective" approach to 
dimensionality assessment. Zwick (1987) noted that 
reliance on full information fit statistics tended to 
lead to overfactoring, and therefore concluded that 
"the size of factors and the patterns of loadings 
should also be considered in determining the number 
of factors." For Stout's (1987) test of dimensionality, 
a short homogeneous set of assessment items 
must be chosen, either by expert opinion or the stra- 
tegic use of factor analysis (Nandakumar, 1991). 
Note that both of these procedures require some 
substantive knowledge on the part of the researcher 
in interpreting patterns of loadings or selecting a 
set of homogeneous items. 

We think that "test dimensionality" is a type of validation
argument concerning a test, and that it is related to
content validation. It is well recognized that content
validity is not a property of test items
themselves, but rather a function of the interaction 
of examinees with test items and a function of the 
test's use. In this regard Messick (1989, p. 41) wrote 

Strictly speaking, even from a content viewpoint, 
it would be more apropos to conceptualize con- 
tent validity as residing not in the test, but in the 
judgment of experts about domain relevance and 
representativeness. 

Because the "nature and dimensionality of the inter- 
item structure should reflect the nature and 
dimensionality of the construct domain," (Messick, 
1989, p. 44) it is likely that judgments regarding the 
content affect those regarding the dimensionality of 
a test. Therefore, test content is not subordinate to 
test dimensionality; rather, an argument must be 
made for a particular conclusion regarding dimen- 
sionality, and this argument should incorporate 
both judgments about test content and evidence 
from statistical analyses. In short, arguments con- 
cerning dimensionality need to be validated in the 
context of a test's use. 

In the analyses below, we incorporate judgments 
concerning the content domain of the LSAT into a 
statistical analysis of item structure. We conceive 
of this approach as an analysis of functional 
dimensionality rather than one of statistical 
dimensionality. This approach is motivated by the 
need to manage the item dimensionality as 
opposed to the unrealistic goal of creating a unidimensional
test. This study assesses the practical consequences, for
both the developer and the examinee, of treating a
multifactor test as unidimensional for the purpose of equating.




Section 5. Obtaining Inter-Item 
Correlation Matrices 

The factoring methods used in the next section are 
based on inter-item tetrachoric correlations. 

Though tetrachoric correlations may yield spurious 
factors (Carroll, 1983; Hulin, Drasgow, and Parsons, 
1983), there is one approach that seems to yield ac- 
ceptable results. Christoffersson (1975) and Muthen 
(1978) developed a generalized least squares (GLS) 
approach based on the work of Browne (1984). 
Bock, Gibbons, and Muraki (1988) compared a full 
information solution with the corresponding GLS 
solutions. The results were highly similar leading to 
the conclusion that the essential correctness of both 
methods was supported. However, the GLS solu- 
tion requires a large amount of computer memory 
and is practically implemented for only about 25 
items. A compromise approach, diagonally 
weighted least squares, has been developed and is 
economically implemented. Below, we provide a ra- 
tionale for the use of tetrachoric correlations. The 
method used to factor analyze matrices of these co- 
efficients will be described in Section 6. 

In modern measurement theory, item responses depend
linearly on the true ability (or abilities) of the



examinee, an item "threshold," and an error of mea- 
surement. When the propensity of an examinee to 
get an item correct exceeds this threshold, a correct 
response or "1" is observed. If it falls below this 
threshold, an incorrect response or " 0 " is observed. 
An examinee's true ability and propensity are unobserved
variables; only the dichotomous response is
observed. This suggests two general strategies for
estimating inter-item correlations: to use observed 
responses in the standard correlation formula; or to 
assume a model by which discrete responses are 
generated from continuous responses, and to esti- 
mate a correlation within the framework of this 
model. For example, if it is assumed that the (unob- 
served) propensities to answer any pair of items 
correctly are bivariate normal, then the tetrachoric 
correlation rt is defined as the correlation parameter 
in the bivariate normal distribution function. 

The latter approach is diagrammed in Figure 1a, in
which a bivariate normal distribution of propensities
is dichotomized at points α and β. This process
leads to the observed frequencies in the 2×2 contingency
table in Figure 1b for a test population with
N examinees.
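The dichotomization just described can be illustrated with a small simulation. This is only a sketch and not part of the original analysis: the function name, the seed, and the example values of the correlation and the two thresholds are hypothetical, chosen to show how a bivariate normal distribution of propensities produces a 2×2 table of counts.

```python
import random

def simulate_crosstab(n, rho, alpha, beta, seed=0):
    """Simulate n examinees answering two items under the threshold model.

    Propensities are standard bivariate normal with correlation rho. A
    response is scored 1 when the propensity exceeds the item threshold
    (alpha for the first item, beta for the second), and 0 otherwise.
    """
    rng = random.Random(seed)
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = z1                                         # propensity, first item
        y = rho * z1 + (1.0 - rho * rho) ** 0.5 * z2   # propensity, second item
        table[(int(x > alpha), int(y > beta))] += 1
    return table

# Hypothetical values: correlation 0.6, thresholds -0.5 and 0.3.
table = simulate_crosstab(n=20000, rho=0.6, alpha=-0.5, beta=0.3)
for cell, count in sorted(table.items()):
    print(cell, count)
```

The keys of the returned dictionary are (first item score, second item score) pairs, so the four counts play the role of the cell frequencies in the contingency table.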



Figure 1a

Bivariate normal model for generating observed dichotomous item responses. 
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Figure 1b 

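Crosstabulated frequencies of observed correct and incorrect responses
from the bivariate normal model.

[2×2 table of wrong/right responses for a pair of items (one axis labeled
"item i"). Cell frequencies: A = N·P_A, B = N·P_B, C = N·P_C, D = N·P_D,
where A + B + C + D = N.]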



To estimate rt, we work backward from the ob- 
served data to the parameters of the bivariate 
function that "generates" the data. The method of 
maximum likelihood (ML) estimation selects rt as 
the value that makes the observed (tabled) data 
"most likely." For example, it would be extremely 
improbable that inter-item correlations of 0.0 or 1.0 
would generate the responses that are observed 
from most real test items because it is common to 
see some examinees miss one item of any pair while 
answering the other correctly. It is more likely that 
such observed item responses conform to some in- 
termediate point in the interval (0.0, 1.0). In this 
study, the results by Tallis (1962) were used to write 
an ML algorithm for estimating rt for all pairs of 
items. The conditional approach was chosen, in
which the marginal means of the bivariate distribution
of propensities (α and β) were fixed.
Simultaneous ML estimation of all parameters was 
attempted, but severe convergence problems were 
encountered in a number of instances. Tallis (1962) 
noted other drawbacks of simultaneous estimation. 
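The conditional estimation idea can be sketched in a few lines. The code below is an illustration under stated assumptions, not the algorithm used in this study: it does not use the Tallis (1962) results and omits the guessing adjustment. It fixes the two thresholds from the marginal proportions, evaluates the bivariate normal cell probabilities by Simpson's rule, and finds the correlation that maximizes the multinomial log-likelihood by golden-section search. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def cell_probabilities(rho, alpha, beta, steps=400):
    """Return (p00, p01, p10, p11) for thresholds alpha and beta.

    p11 = P(X > alpha, Y > beta) is the integral over x > alpha of
    phi(x) * P(Y > beta | X = x), approximated by Simpson's rule.
    """
    s = math.sqrt(1.0 - rho * rho)
    upper = max(alpha, 0.0) + 8.0          # tail beyond this is negligible
    h = (upper - alpha) / steps

    def f(x):
        return _STD.pdf(x) * (1.0 - _STD.cdf((beta - rho * x) / s))

    total = f(alpha) + f(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(alpha + k * h)
    p11 = total * h / 3.0
    p1_ = 1.0 - _STD.cdf(alpha)            # first item right
    p_1 = 1.0 - _STD.cdf(beta)             # second item right
    return (1.0 - p1_ - p_1 + p11, p_1 - p11, p1_ - p11, p11)

def tetrachoric_ml(n00, n01, n10, n11):
    """ML estimate of the tetrachoric correlation with thresholds fixed.

    n_ab counts examinees scoring a on the first item and b on the second.
    Both items need some right and some wrong answers.
    """
    n = n00 + n01 + n10 + n11
    alpha = _STD.inv_cdf((n00 + n01) / n)  # from proportion wrong, first item
    beta = _STD.inv_cdf((n00 + n10) / n)   # from proportion wrong, second item
    counts = (n00, n01, n10, n11)

    def loglik(rho):
        probs = cell_probabilities(rho, alpha, beta)
        return sum(c * math.log(max(p, 1e-12)) for c, p in zip(counts, probs))

    # Golden-section search for the maximum on (-0.99, 0.99).
    a, b = -0.99, 0.99
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = loglik(c), loglik(d)
    while b - a > 1e-6:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = loglik(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = loglik(d)
    return 0.5 * (a + b)

# Hypothetical table: both thresholds at zero, concordant cells twice as large.
r_t = tetrachoric_ml(400, 200, 200, 400)
print(round(r_t, 3))
```

Fixing the thresholds leaves a single free parameter, which is why a one-dimensional search suffices here; simultaneous estimation of all three parameters would need a multivariate optimizer.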



Entries in the 2x2 table were adjusted for guessing 
prior to ML estimation. Briefly, this entailed shifting 
a proportion of responses from correct to incorrect 
to compensate for guessing. Carroll (1945, 1983) 
and Bock, Gibbons, and Muraki (1988) presented a 
more complete treatment of this adjustment. A com- 
mon guessing parameter of 0.1 was chosen for all 
items. This is the approximate value of the median 
c estimate on LOGIST calibrations of the LSAT (Ali- 
son Sneickus, personal communication). The error 
variances of the rts were also estimated with the ML 
algorithm, and were saved for further use (as de- 
scribed below). Because some correlations are 
estimated much more precisely than others as indi- 
cated by smaller variances, it is desirable to give 
such estimates more weight in ensuing analyses. 

All correlation matrices were obtained from the full 
population of test takers, and not small subsamples 
(sample sizes ranged from 25,000 to 45,000). 
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In a number of instances examinees failed to at- 
tempt or did not reach an item. Such item responses 
are scored as incorrect, which results in mis-estimation
of the rt (Carroll, 1945, 1983). To compensate
for this source of error, we substituted plausible re- 
sponses for omitted and not-reached items, and 
then employed Carroll's correction for guessing uni- 
formly for all examinees. Specifically, we assumed 
that if these examinees followed the instructions 
given in the LSAT information booklet, they would 
quickly make guesses for omitted or unreached 
items at the end of the test period. This guessing 



process would result in a probability of 0.2 (with a 
five option item) of answering correctly, while a 
"normal" guessing process would still result in a 
probability of 0.10. Thus, when omitted or un- 
reached responses were encountered, item 
responses were randomly generated according to 
the following model. A uniform random deviate U 
was drawn on the interval [0,1]. If U < 0.2, then a 
"1" was recorded; otherwise, a "0" was recorded. 
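The substitution rule above is simple enough to sketch directly. In this illustration, which is an assumption about representation and not the original program, an omitted or not-reached response is coded as `None`, and the function name and seed are hypothetical.

```python
import random

def impute_missing(responses, p_guess=0.2, seed=None):
    """Replace omitted or not-reached responses with random guesses.

    Observed responses (0 or 1) are kept. For each missing response (None),
    a uniform deviate U is drawn; a 1 is recorded if U < p_guess and a 0
    otherwise. p_guess = 0.2 corresponds to a five-option item.
    """
    rng = random.Random(seed)
    completed = []
    for r in responses:
        if r is None:
            completed.append(1 if rng.random() < p_guess else 0)
        else:
            completed.append(r)
    return completed

# Hypothetical response vector with two missing entries at the end.
print(impute_missing([1, 0, 1, None, None], seed=1))
```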
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Section 6. Design of Confirmatory 
Factor Solutions 

Item factor analysis typically is encountered in two 
forms, exploratory and confirmatory. The confirmatory
use is characterized by imposing (to a limited
degree) a hypothesized structure on the item data
(in the form of the inter-item correlation matrix)
and measuring the "fit," or how well the structure
accounts for the data. Since the "true" structure is
rarely, if ever, known, the basic strategy of the confirmatory
approach is to compare different or rival
hypotheses about the structure. If one structure
"fits" better than another, in terms of both practical
and statistical significance, then it is preferred.
Clearly, there is an intended structure to the LSAT 
which is based both on content specification and 
item type. The LSAT is divided into three main sec- 
tions labeled Analytical Reasoning (AR), Reading 
Comprehension (RC), and Logical Reasoning (LR). 
Items in these sections may be self-contained or 
may appear in clusters. Item clusters, or testlets, are 
items that refer to the same extended stem or pas- 
sage. All AR and RC items are clustered in groups 
of 4-8, while LR items most frequently stand alone 
(or infrequently in couplets). 

Two preliminary confirmatory solutions were de- 
signed: first, a 3-factor solution in which items from 
Analytical Reasoning (AR), Reading Comprehen- 
sion (RC), and Logical Reasoning (LR) each loaded 
on separate factors; and second, an 11-factor solu- 
tion in which each testlet (defined as four or more 
test items subsumed under the same passage) of AR 
and RC items loaded on separate factors, while LR 
items defined a single factor. Within both designs, 
correlations among factors were estimated as free pa- 
rameters. Upon estimation, the testlet (or factors 
representing testlets) correlation matrix was ob- 
tained for both designs. The procedures for the 3- 
and 11-factor solutions were thus confirmatory ex- 
cept for the fact that a correlational structure among 
the factors was not directly imposed. 



This two-step process was chosen instead of a completely
exploratory solution (in which no structure is
imposed except for the number of factors) for two rea- 
sons. First, exploratory approaches rarely identify 
correct models in the presence of highly correlated 
variables. Such models efficiently provide the "best 
fit" to the data, but in a predictive rather than substan- 
tive sense. In fact a number of models may fit the data 
reasonably well, so that other criteria such as interpretability,
parsimony, and cross-validation should also be
considered. Second, a confirmatory approach based 
on the content specification of the LSAT more closely 
relates the statistical structure of the items to the item 
development process. For example, examining how a 
testlet functions within a multidimensional test is 
more relevant to item developers than examining a 
single item from a testlet. This is because items from 
clusters are not developed independently, nor are 
they independent in a statistical sense. Factoring 
methods that recognize this non-independence are 
likely to have greater practical significance as well as 
providing a good fit to the data. 

For estimation, a diagonally weighted least squares
(DWLS) algorithm was used (Joreskog and Sorbom,
1989). This method iteratively approximates model
coefficients (i.e., factor loadings) by obtaining an initial
solution and then using this as a stepping-stone
to locate a "better" solution. Here a "better"
solution means one in which the tetrachoric correlation
matrix can be reproduced more accurately from
estimated factor coefficients, because the factor coefficients
are considered to "generate" the correlational
structure. During estimation, the sum of squared discrepancies
(say S) between the inter-item correlations
predicted by a particular factor structure (say $\hat{r}_{gh}$)
and the observed tetrachorics (say $r_{gh}$) is successively
reduced. Technically, a new solution is chosen at step i+1
that shows better fit than the solution at step i; thus,
the strategy is to minimize S. Iterations are ended
when decreases in S become negligible.

The method of DWLS defines S as

$$S = \sum_{g<h} \frac{(r_{gh} - \hat{r}_{gh})^2}{w_{gh}}$$

where $r_{gh}$ is the observed tetrachoric correlation between items g and h,
$\hat{r}_{gh}$ is the correlation predicted by the factor structure, and
$w_{gh}$ is the conditional asymptotic variance of
the ML estimate $r_{gh}$. In this manner, estimates of $r_{gh}$
with more sampling error contribute less to the solution.
Both the rts and their weights were obtained
with the full samples of test takers (see Table 1 for
sample sizes) and then input to the DWLS algorithm
implemented in LISREL 7. This method is
more efficient than an unweighted solution
(Joreskog and Sorbom, 1989). (The method of fully,
or generally, weighted least squares, which does
yield asymptotically efficient estimates, is not practically
implemented with more than 20-30 items.)

For the 11-factor solution, we expected that passages
should be correlated higher with other
passages in the same content area than those in different
content areas. To reveal this potential
phenomenon, we designed a secondary confirmatory
analysis for the 11-factor correlation matrix. In
this design, we assumed two factors. The first was
identified by fixing the loadings of one AR testlet
to "1" on the first factor and "0" on the second.
Likewise, the loadings for one RC testlet were fixed
as "0" on the first factor and "1" on the second.
Aside from these constraints, all other coefficients
were treated as free parameters, including the correlation
of the two factors. Finally, the standardized
solution for this design was obtained with LISREL.
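The weighted discrepancy can be written out as a short function. This is a minimal sketch, not the LISREL implementation: it assumes S is the sum over item pairs of the squared difference between observed and model-implied correlations, each divided by the sampling variance of the observed correlation. The function names and the three-item example values are hypothetical.

```python
def implied_correlations(loadings, phi):
    """Model-implied correlations Lambda * Phi * Lambda' for a factor model.

    loadings is a list of rows (one per item, one column per factor) and phi
    is the factor correlation matrix. Only off-diagonal entries are used.
    """
    n, k = len(loadings), len(phi)
    return [[sum(loadings[g][a] * phi[a][b] * loadings[h][b]
                 for a in range(k) for b in range(k))
             for h in range(n)]
            for g in range(n)]

def dwls_discrepancy(observed, predicted, variances):
    """Sum over pairs g < h of (observed - predicted)^2 / variance."""
    n = len(observed)
    total = 0.0
    for g in range(n):
        for h in range(g + 1, n):
            total += (observed[g][h] - predicted[g][h]) ** 2 / variances[g][h]
    return total

# Hypothetical one-factor example with three items.
observed = [[1.0, 0.5, 0.4],
            [0.5, 1.0, 0.3],
            [0.4, 0.3, 1.0]]
variances = [[0.0, 0.01, 0.02],
             [0.01, 0.0, 0.04],
             [0.02, 0.04, 0.0]]
predicted = implied_correlations([[0.8], [0.6], [0.5]], [[1.0]])
S = dwls_discrepancy(observed, predicted, variances)
print(S)
```

An estimation routine would adjust the loadings to make S as small as possible; correlations with large sampling variance are divided by a larger number and so pull less on the solution.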
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Section 7. Results of Factor Analyses 

First-order Analyses 

Six forms of the LSAT were factored using confirma- 
tory methods, and model coefficients were obtained. 
These included factor loadings and inter-factor correlations.
The solutions were highly similar for each
form; therefore, the LISREL analyses are given for the
September 1989 administration only (0LSS1). For the
3-factor solution of this data set (Appendix 1), the adjusted
goodness of fit index (AGI) and root mean
square residual (RMSR) were 0.963 and 0.040, respec- 
tively. For the 11-factor solution (Appendix 2), the 
AGI and RMSR were 0.986 and 0.027. For the purpose 
of interpretation, AGI is an index that varies between 
0 and 1, with 1 being "perfect" fit, while the RMSR 
index should be close to 0 with models that fit the 
data closely. The χ² values for these models were not
meaningful given the large sample sizes; all χ² values
obtained were on the order of 10^6. The AGI and
RMSR were used to compare the fits of the two models.
For example, the 11-factor solution had an AGI
about 2% higher (0.986 v 0.963), and an RMSR about
one-third lower (0.027 v 0.040), than the 3-factor model.

This finding was consistent for each of the six forms 
examined. Thus, we concluded that the 11-factor solu- 
tion gave a meaningfully better fit. 

For the 3-factor solution it was observed that the 
correlations across the six LSAT forms between the 
RC and LR factors averaged 0.9, while the AR/RC 
and AR/LR correlations averaged 0.66 and 0.75, re- 
spectively. Thus, it appeared that a 2-factor solution 
might be appropriate with RC and LR defining one 
factor, due to their high intercorrelation, and AR de- 
fining another. Two additional confirmatory models 
were examined for determining the number of fac- 
tors. For 2 LSAT administrations (0LSS2 and 0LSS3) 
a 12-factor solution was obtained with one general 
factor and 11 unique factors, and a 13-factor solu- 
tion was obtained with two general factors and 11 
unique factors. Both of these models fit marginally 
better than the 11-factor solution; however, the 13- 
factor model fit better than the 12-factor model (in 
terms of the AGI and RMSR) suggesting the pres- 
ence of two factors. 

The 11-factor design, or testlet design, probably ac- 
quired its relative power by modeling the 
interdependence of items within a testlet. Such items 



were not locally independent in the 3-factor solu- 
tion due to the same characteristic of the 
overarching passage: inter-item correlations within 
a passage were significantly underestimated. For example,
item responses pertaining to a passage
dealing with supply-side economics showed very
high residual correlations for the 3-factor model.
Nonsubstantive characteristics of some passages
could also have led to local dependence. In the
testlet design, each testlet defined an ability that com- 
bined both common (say reading comprehension) 
and unique (say knowledge of economics) elements. 
The unique elements may have affected item re- 
sponses even though they were not strictly required 
for answering correctly; for example, substantive fa- 
miliarity could have increased processing efficiency. 

Second-order Analyses for the 11-factor Model 

As mentioned above, we expected that passages 
should be correlated higher with other passages in 
the same content area than those in different con- 
tent areas. To reveal this potential structure without 
imposing it, we designed a secondary analysis (see 
Section 6) with only mild restrictions on the factor 
structure. The standardized solutions for this de- 
sign were obtained for 3 LSAT administrations: 

June (0LSS2) and December (0LSS3) 1989, and Octo- 
ber (0LSS6) 1990. These solutions and the 11-factor 
correlation matrices appear in Tables 3-5. 

The findings were highly similar across these three 
analyses. A highly consistent structure emerged in 
which Factor 1 was marked by the AR testlets and 
to a slight degree by LR, and Factor 2 was marked by 
the RC testlets and to a lesser degree by the LR subscale.
These factors, which were labeled AR and RC/LR,
respectively, were moderately correlated (0.56 - 0.74). Thus, the
two major abilities influencing test performance are 
not statistically independent. The RC and LR abilities 
may be distinct, but correlated to such a high degree 
that they were practically indistinguishable. How- 
ever, because both RC and LR items are also intended 
to measure inductive reasoning, we conclude that one 
statistical factor subsumes both item types. Finally, it 
is noted that the secondary factor results were corrob- 
orated by purely exploratory factor analyses of the 
inter-factor correlation matrices. 
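One way to read a standardized two-factor solution is to reproduce a correlation from it. The sketch below does this for Var 6 and Var 7 of the June 1989 (0LSS2) solution, using the loadings, factor correlation, and observed correlation tabulated for that administration. The function name is hypothetical, and the calculation assumes the usual relation for a standardized solution in which the implied correlation is the loading cross-products weighted by the factor correlation.

```python
def implied_correlation(load_g, load_h, phi):
    """Correlation implied by a standardized two-factor solution.

    load_g and load_h are (Factor 1, Factor 2) loadings for two variables
    and phi is the correlation between the two factors.
    """
    (g1, g2), (h1, h2) = load_g, load_h
    return g1 * h1 + g2 * h2 + phi * (g1 * h2 + g2 * h1)

# Values for Var 6 and Var 7, June 1989 administration (0LSS2).
r67 = implied_correlation((-0.083, 0.959), (-0.052, 0.925), 0.713)
residual = 0.818 - r67   # observed correlation minus implied correlation
print(round(r67, 3), round(residual, 3))
```

The implied value is close to 0.80 against an observed 0.818, a residual of the same order as the root mean square residual reported for this solution.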
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Factor inter-correlation matrix and secondary factor analysis results for the 
June 1989 administration (0LSS2). 



Correlation Matrix

         Var 1   Var 2   Var 3   Var 4   Var 5   Var 6   Var 7   Var 8   Var 9   Var 10
Var 2    0.477
Var 3    0.473   0.577
Var 4    0.389   0.503   0.634
Var 5    0.442   0.472   0.588   0.653
Var 6    0.453   0.446   0.490   0.421   0.479
Var 7    0.420   0.473   0.490   0.432   0.475   0.818
Var 8    0.328   0.310   0.336   0.319   0.338   0.544   0.545
Var 9    0.385   0.404   0.441   0.433   0.449   0.751   0.736   0.519
Var 10   0.437   0.419   0.487   0.503   0.597   0.768   0.731   0.483   0.764
Var 11   0.558   0.559   0.611   0.604   0.637   0.838   0.838   0.575   0.789   0.836

Standardized Factor Solution

         Factor 1   Factor 2
Var 1     0.432      0.206
Var 2     0.594      0.090
Var 3     0.777      0.000
Var 4     0.889     -0.113
Var 5     0.757      0.035
Var 6    -0.083      0.959
Var 7    -0.052      0.925
Var 8     0.000      0.603
Var 9    -0.056      0.881
Var 10    0.116      0.778
Var 11    0.249      0.771

Factor Correlation r = .713

Root Mean Square Residual .025





Table 4 



Factor inter-correlation matrix and secondary factor analysis results for the
September 1989 administration (0LSS1).

Correlation Matrix

         Var 1   Var 2   Var 3   Var 4   Var 5   Var 6   Var 7   Var 8   Var 9   Var 10
Var 2    0.522
Var 3    0.521   0.683
Var 4    0.493   0.699   0.672
Var 5    0.414   0.593   0.625   0.641
Var 6    0.510   0.574   0.535   0.599   0.477
Var 7    0.393   0.483   0.404   0.496   0.355   0.790
Var 8    0.383   0.483   0.400   0.537   0.373   0.779   0.798
Var 9    0.480   0.590   0.581   0.611   0.531   0.804   0.729   0.739
Var 10   0.372   0.489   0.484   0.540   0.502   0.688   0.659   0.642   0.789
Var 11   0.556   0.653   0.599   0.680   0.551   0.873   0.791   0.812   0.864   0.741

Standardized Factor Solution

         Factor 1   Factor 2
Var 1     0.500      0.187
Var 2     0.741      0.123
Var 3     0.834      0.000
Var 4     0.720      0.170
Var 5     0.746      0.013
Var 6     0.191      0.789
Var 7    -0.018      0.883
Var 8     0.000      0.878
Var 9     0.291      0.700
Var 10    0.234      0.627
Var 11    0.307      0.752

Factor Correlation r = .558

Root Mean Square Residual .021




Table 5 

Factor inter-correlation matrix and secondary factor analysis results for the 
October 1990 administration (0LSS6). 



Correlation Matrix

         Var 1   Var 2   Var 3   Var 4   Var 5   Var 6   Var 7   Var 8   Var 9   Var 10
Var 2    0.592
Var 3    0.633   0.577
Var 4    0.645   0.645   0.665
Var 5    0.550   0.563   0.533   0.701
Var 6    0.538   0.494   0.530   0.551   0.485
Var 7    0.555   0.511   0.567   0.556   0.489   0.905
Var 8    0.518   0.449   0.511   0.504   0.426   0.827   0.853
Var 9    0.494   0.452   0.526   0.502   0.436   0.781   0.814   0.769
Var 10   0.464   0.418   0.483   0.475   0.440   0.700   0.718   0.711   0.709
Var 11   0.645   0.573   0.665   0.658   0.578   0.835   0.896   0.836   0.818   0.781

Standardized Factor Solution

         Factor 1   Factor 2
Var 1     0.772     -0.008
Var 2     0.811     -0.098
Var 3     0.776      0.000
Var 4     1.008     -0.192
Var 5     0.872     -0.159
Var 6     0.048      0.884
Var 7     0.044      0.927
Var 8     0.000      0.896
Var 9     0.060      0.813
Var 10    0.120      0.690
Var 11    0.324      0.686

Factor Correlation r = .745

Root Mean Square Residual .016
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Section 8. Results of Equating Analyses 

For 4 LSAT forms (0LSS2, 0LSS3, 0LSS5, and 
0LSS6), items were calibrated with BILOG using 
standard options. The estimated as, bs, and cs were 
obtained for heterogeneous (HT) and homogeneous 
(HM) groups of items, and placed on the same scale 
with the TBSE procedure. The item parameter esti- 
mates for the HM and HT calibrations were then 
compared by examining correlations and root mean 
square differences (RMSD). The results were similar 
across forms, and are typified by those for 0LSS2 
and 0LSS3 which are given in Tables 6 and 7. 
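The two comparison statistics used here are straightforward to compute. The sketch below is illustrative only: the function names are hypothetical, and the two short lists stand in for the difficulty (b) estimates of the same items under the HT and HM calibrations; they are made-up values, not data from this study.

```python
import math

def rmsd(x, y):
    """Root mean square difference between paired parameter estimates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def pearson(x, y):
    """Pearson correlation between paired parameter estimates."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical b estimates for five items under two calibrations.
b_ht = [-1.20, -0.40, 0.10, 0.85, 1.60]
b_hm = [-1.10, -0.45, 0.20, 0.80, 1.75]
print(round(pearson(b_ht, b_hm), 3), round(rmsd(b_ht, b_hm), 3))
```

The two statistics answer different questions: a correlation near 1 shows that the calibrations rank and space the items alike, while the RMSD shows how far apart the estimates are in the units of the parameter.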



Overall, several observations can be made. First, the 
correlations among the HM and HT estimated bs, 
whether in AR or RC+LR sets of items, were quite 
high with a modal value of about 0.99. Second, the 
correlations among the a and c parameter estimates 
were high for RC+LR sets, but lower for AR sets. 
Third, the differences between the HM and HT pa- 
rameter estimates were relatively low for the RC+LR 
subset. For example, the 0LSS2 root mean square differences
(RMSD) for RC+LR were 0.04, 0.08, and 0.02
for the a, b, and c differences, respectively, while these 
differences for the AR set were 0.10, 0.30, and 0.07. In 
general, the RMSDs were about twice as high for the 
AR sets as the RC+LR sets. 



Table 6

Correlations and root mean square differences for HM and HT
item parameters from form 0LSS2.

All Items
                  A2        B2        C2
Correlations
  A1             .92**     .04       .09
  B1             .10       .99**     .18
  C1             .27**     .10       .86**
RMSD             .07       .18       .04

AR Items
                  A2        B2        C2
Correlations
  A1             .73**     .40*      .18
  B1             .60**     .98**     .30
  C1             .46**     .15       .74**
RMSD             .10       .30       .07

RC + LR Items
                  A2        B2        C2
Correlations
  A1             .97**    -.07      -.09
  B1            -.07       .99+**    .14
  C1            -.03       .10       .96**
RMSD             .04       .08       .02

* p < .05
** p < .01
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Table 7 



Correlations and root mean square differences for HM and HT item 
parameters from form 0LSS3. 



All Items
                  A2        B2        C2
Correlations
  A1             .94**     .27**     .06
  B1             .26*      .99**     .15
  C1             .04       .12       .86**
RMSD             .09       .15       .05

AR Items
                  A2        B2        C2
Correlations
  A1             .91**     .42*      .03
  B1             .53**     .98**     .33
  C1             .14       .36       .77**
RMSD             .10       .22       .07

RC + LR Items
                  A2        B2        C2
Correlations
  A1             .97**     .24*      .03
  B1             .26*      .99+**    .09
  C1             .03       .03       .97**
RMSD             .06       .08       .02

* p < .05
** p < .01



The equivalence of the true score equating tables 
was also examined for the HM and HT item param- 
eter estimates. A conversion table was constructed 
for both the HT and HM calibrations separately for 
converting true scores for four LSAT administra- 
tions to true scores on a base form (0LSS2). The base 
form scale was defined by the simultaneous calibration
of all items (i.e., the HT set), and conversions
were plotted as the difference between the HT or
HM calibrations for each form and the HT calibration
for the base form (labeled HT2). In Figure 2,
the results are presented for converting 0LSS3 
scores to the 0LSS2 scale. 




Figure 2.

Score conversions for 0LSS3.

[Plot against true score on Form 0LSS2; curves labeled HT3-HT2 and HM3-HT2.]



The HM conversion function to base is given by 
HM3-HT2, and the HT conversion to base is given 
by HT3-HT2. As can be seen, the conversion functions 
for the heterogeneous and homogeneous calibrations, 
though nonlinear, are highly similar: throughout the 
true score range on form 0LSS2, the two calibrations 
appear to result in true scores that differ at most by 
about one half point. Also in Figure 2, it can be seen 
that true scores for the HM and HT calibrations of the 
base form (HM2 and HT2) are also similar having a 
difference (HM2-HT2) close to zero throughout the 
true score range. 
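The logic of an IRT true score conversion can be sketched as follows. This is an illustration under stated assumptions, not the procedure or software used in this study: it assumes a three-parameter logistic model with the conventional scaling constant 1.7, item parameters already on a common scale, and bisection to find the ability matching a given true score. The function names and the three-item parameter sets are hypothetical.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def true_score(theta, items):
    """Expected number-right score at ability theta; items are (a, b, c)."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)

def theta_for_score(score, items, lo=-8.0, hi=8.0, tol=1e-8):
    """Ability whose true score equals `score`, found by bisection.

    True scores at or below the sum of the c parameters have no matching
    ability, so such inputs raise ValueError.
    """
    if not true_score(lo, items) < score < true_score(hi, items):
        raise ValueError("true score outside the attainable range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if true_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def convert(score, new_items, base_items):
    """Convert a true score on the new form to a true score on the base form."""
    return true_score(theta_for_score(score, new_items), base_items)

# Hypothetical forms: the new form is the base form made uniformly harder.
base_form = [(1.0, -1.0, 0.2), (0.8, 0.0, 0.2), (1.2, 1.0, 0.2)]
new_form = [(a, b + 0.5, c) for a, b, c in base_form]
print(round(convert(2.0, new_form, base_form), 3))
```

Running the conversion once with heterogeneous-calibration parameters and once with homogeneous-calibration parameters, and differencing the two results across the score range, gives the kind of comparison plotted in the figures.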



Similar conversion functions are plotted in Figures 
3 and 4 for forms 0LSS5 and 0LSS6. These HM and 
HT conversions also resulted in small differences (see Note 2).
In Figure 5, only the HM-HT differences are plotted 
for each form. These differences ranged from -0.6 to 
+0.3 point. The largest differences were observed at 
the low and high regions of the base form scale. 
Still, the effect of multidimensionality on true score 
conversions is less than 1 point, or one question 
on form 0LSS2. Therefore, violations of uni- 
dimensionality as evidenced by the factor analyses 
do not appear to have a substantial impact on true 
score conversion tables. 
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Figure 3. 

Score conversions for 0LSS5. 




HT5-HT2 HM5-HT2 



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True score difference 



Figure 4. 

Score conversions for 0LSS6 




HT6-HT2 HM6-HT2 




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True score difference 



Figure 5. 

Differences between true scores for homogeneous and heterogeneous 
item calibrations for four test forms. 




0LSS2 0LSS3 0LSS5 0LSS6 
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Section 9. Future Research 

The effect of multidimensionality on the true score 
conversion tables appears to be minimal. However, 
inherent in the method of Dorans and Kingston 
(1985) is the assumption that the quantity being 
equated is a composite of underlying abilities. This 
is because the LSAT raw score, being the sum of 
right/wrong item responses, forces a unit weighting
of items that may represent different underlying
factors. An alternative scoring method based on a 3P 
IRT model would also extract a composite of the un- 
derlying multiple abilities, though an item's 
contribution to ability estimation would be weighted. 
We have found in this study that true score conver- 
sions for the composite score are not greatly affected 
by the presence of two substantially correlated factors. 

The effect of multidimensionality could be substan- 
tially different for methods of equating that do not 
use a composite ability. For example, suppose true 
score conversion tables were created for AR and 
RC+LR separately. The "total score" on the base 
scale could then be obtained as the sum of the two 
converted true scores. Now the question could be 
asked, "What is the difference between the heterogeneous
test conversion and the sum of the two
homogeneous test conversions?" An analysis of this
phenomenon is more complex than the analysis in
this report for two reasons. First, there is not a single
conversion function; rather, there is a converted
true score on the base form for each pair of true
scores on the new form. Thus, the equating function
is three-dimensional, requiring three-dimensional
versions of Figures 2-5. Second, if two
subtests were defined for equating, each would be
shorter than the full test, resulting in diminished reliability.
Also note that the AR subtest (with 29
items) would have substantially less reliability than
the RC+LR subtest (with 63-69 items).

We have implied above that estimates of ability on
the AR and RC+LR subtests may be substantially
different for some examinees. This possibility is suggested
by the moderate correlation (r ≈ 0.7) between
the two abilities. It seems important for two reasons
to identify the examinees for whom this discrepancy
is large, and to determine whether the
discrepancy is a function of demographic or other
background variables. First, systematic discrepancies
in ability estimates for certain groups create
the potential for differential item functioning. Second,
if both abilities are important for the
prediction of academic performance, then some loss
of predictive power might be associated with treating
the test as unidimensional.

Notes

1. Often matrices of estimated tetrachorics are
nonpositive definite, which is problematic for
statistical methods that require inversion of this
matrix. In the present study, all tetrachoric matrices
were positive definite, though the method of
diagonally weighted least squares implemented
in LISREL 7 does not require this condition.

2. The amplitude of the S-shaped curves, i.e., the vertical
distance of a single curve to the zero point
on the Y-axis, is affected by the number of items
on the tests being equated. For this reason, it is
important to focus on the difference between the
two equating curves for the purpose of assessing
the effects of multidimensionality.
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Appendix 1. 

Abridged LISREL Printout for the 3-factor Solution 



LISREL 7: 

Estimation of Linear Structural Equation Systems 
Program Version 7.16 

Distributed by: 

Scientific Software, Inc. 

1369 Neitzel Road 
Mooresville, Indiana 46158
(317) 831-6336 

This copy authorized for use in SPSS-X. 

Program Copyright 1977-89 by Scientific Software, 
Inc., (a Michigan corporation). 

Distribution or use unauthorized by Scientific Soft- 
ware, Inc. is prohibited. 

MVS -LISREL 7.16 
By 

Karl G. Joreskog and Dag Sorbom 



The following LISREL control lines have been read: 

DA NI=94 NO=43317 MA=PM 
PM UNIT=18 
DM UNIT=17 

MO NX=94 NK=3 LX=FR TD=DI,FR PH=SY 
FI PH(1,1) PH(2,2) PH(3,3) 
VA 1.0 PH(1,1) PH(2,2) PH(3,3) 
PA LX 
29*(1 0 0) 32*(0 1 0) 33*(0 0 1) 
OU DWLS TM=1500 

September 1989. Three factor solution. 
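The loading pattern given after PA LX can be read as a compact description of a 94 x 3 matrix of free (1) and fixed (0) loadings. The sketch below expands such a description, reading the pattern as 29 items on the first factor, 32 on the second, and 33 on the third, an interpretation consistent with the loadings reported in the printout. The function and variable names are hypothetical and are not part of LISREL.

```python
def expand_pattern(blocks):
    """Expand (repeat_count, row) pairs into a full list of pattern rows."""
    rows = []
    for count, row in blocks:
        # Each block repeats the same row of 0/1 flags `count` times.
        rows.extend([list(row) for _ in range(count)])
    return rows


# Assumed reading of the three-factor pattern: 29, 32, and 33 items per factor.
three_factor_blocks = [
    (29, (1, 0, 0)),
    (32, (0, 1, 0)),
    (33, (0, 0, 1)),
]

pattern = expand_pattern(three_factor_blocks)
items_per_factor = [sum(column) for column in zip(*pattern)]
print(len(pattern), items_per_factor)
```

Each item loads on exactly one factor in this pattern, so the column sums give the number of items per factor and the row count should equal the number of input variables.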



Number of Input Variables   94 
Number of Y-Variables        0 
Number of X-Variables       94 
Number of ETA-Variables      0 
Number of KSI-Variables      3 
Number of Observations   43317 

Warning: Chi-square, standard errors, t-values and 
standardized residuals are calculated under the 
assumption of multivariate normality. 



LISREL Estimates (Diagonally Weighted Least Squares) 
Lambda X 

           KSI 1    KSI 2    KSI 3                KSI 1    KSI 2    KSI 3
Var 1      0.627    0.000    0.000     Var 47     0.000    0.398    0.000
Var 2      0.607    0.000    0.000     Var 48     0.000    0.448    0.000
Var 3      0.569    0.000    0.000     Var 49     0.000    0.419    0.000
Var 4      0.661    0.000    0.000     Var 50     0.000    0.538    0.000
Var 5      0.654    0.000    0.000     Var 51     0.000    0.444    0.000
Var 6      0.652    0.000    0.000     Var 52     0.000    0.462    0.000
Var 7      0.534    0.000    0.000     Var 53     0.000    0.426    0.000
Var 8      0.500    0.000    0.000     Var 54     0.000    0.463    0.000
Var 9      0.570    0.000    0.000     Var 55     0.000    0.463    0.000
Var 10     0.522    0.000    0.000     Var 56     0.000    0.595    0.000
Var 11     0.556    0.000    0.000     Var 57     0.000    0.464    0.000
Var 12     0.543    0.000    0.000     Var 58     0.000    0.489    0.000
Var 13     0.598    0.000    0.000     Var 59     0.000    0.624    0.000
Var 14     0.633    0.000    0.000     Var 60     0.000    0.510    0.000
Var 15     0.488    0.000    0.000     Var 61     0.000    0.599    0.000
Var 16     0.625    0.000    0.000     Var 62     0.000    0.000    0.266
Var 17     0.645    0.000    0.000     Var 63     0.000    0.000    0.389
Var 18     0.631    0.000    0.000     Var 64     0.000    0.000    0.440
Var 19     0.592    0.000    0.000     Var 65     0.000    0.000    0.333
Var 20     0.814    0.000    0.000     Var 66     0.000    0.000    0.535
Var 21     0.462    0.000    0.000     Var 67     0.000    0.000    0.554
Var 22     0.506    0.000    0.000     Var 68     0.000    0.000    0.393
Var 23     0.771    0.000    0.000     Var 69     0.000    0.000    0.590
Var 24     0.519    0.000    0.000     Var 70     0.000    0.000    0.330
Var 25     0.599    0.000    0.000     Var 71     0.000    0.000    0.393
Var 26     0.599    0.000    0.000     Var 72     0.000    0.000    0.518
Var 27     0.579    0.000    0.000     Var 73     0.000    0.000    0.442
Var 28     0.665    0.000    0.000     Var 74     0.000    0.000    0.667
Var 29     0.644    0.000    0.000     Var 75     0.000    0.000    0.356
Var 30     0.000    0.399    0.000     Var 76     0.000    0.000    0.367
Var 31     0.000    0.195    0.000     Var 77     0.000    0.000    0.593
Var 32     0.000    0.480    0.000     Var 78     0.000    0.000    0.475
Var 33     0.000    0.392    0.000     Var 79     0.000    0.000    0.210
Var 34     0.000    0.443    0.000     Var 80     0.000    0.000    0.479
Var 35     0.000    0.603    0.000     Var 81     0.000    0.000    0.269
Var 36     0.000    0.425    0.000     Var 82     0.000    0.000    0.446
Var 37     0.000    0.500    0.000     Var 83     0.000    0.000    0.389
Var 38     0.000    0.452    0.000     Var 84     0.000    0.000    0.508
Var 39     0.000    0.243    0.000     Var 85     0.000    0.000    0.480
Var 40     0.000    0.433    0.000     Var 86     0.000    0.000    0.447
Var 41     0.000    0.347    0.000     Var 87     0.000    0.000    0.349
Var 42     0.000    0.378    0.000     Var 88     0.000    0.000    0.435
Var 43     0.000    0.326    0.000     Var 89     0.000    0.000    0.676
Var 44     0.000    0.411    0.000     Var 90     0.000    0.000    0.417
Var 45     0.000    0.444    0.000     Var 91     0.000    0.000    0.696
Var 46     0.000    0.516    0.000     Var 92     0.000    0.000    0.573
                                       Var 93     0.000    0.000    0.518
                                       Var 94     0.000    0.000    0.619

Phi 

           KSI 1    KSI 2    KSI 3
KSI 1      1.000
KSI 2      0.676    1.000
KSI 3      0.771    0.893    1.000

Total coefficient of determination for X-variables is 0.997. 

Chi-square with 4274 degrees of freedom = 625285.97 (p = 0.000) 
Goodness of Fit Index = 0.965 
Adjusted Goodness of Fit Index = 0.963 
Root Mean Square Residual = 0.040 


Summary Statistics for Fitted Residuals 





Smallest Fitted Residual = -0.197 
Median Fitted Residual = -0.002 
Largest Fitted Residual = 0.435 
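
The fitted residuals summarized above are differences between observed and model-implied correlations. As a minimal sketch of how an implied correlation follows from the estimates in this printout, assume each item loads on a single factor, so that the implied correlation between two items is the product of their loadings and the correlation between their factors. The function names are hypothetical; the numerical inputs are the printed estimates for Var 1, Var 2, Var 30, and the KSI 1 with KSI 2 correlation.

```python
def implied_correlation(loading_i: float, loading_j: float, factor_corr: float = 1.0) -> float:
    """Model-implied correlation between two items that each load on one factor.

    Use factor_corr=1.0 when both items load on the same factor.
    """
    return loading_i * loading_j * factor_corr


def fitted_residual(observed: float, implied: float) -> float:
    """Fitted residual: observed correlation minus model-implied correlation."""
    return observed - implied


# Var 1 and Var 2 both load on KSI 1 (loadings 0.627 and 0.607).
same_factor = implied_correlation(0.627, 0.607)

# Var 1 loads on KSI 1 (0.627), Var 30 on KSI 2 (0.399); Phi(2,1) = 0.676.
cross_factor = implied_correlation(0.627, 0.399, factor_corr=0.676)

print(f"Implied r(Var 1, Var 2):  {same_factor:.3f}")
print(f"Implied r(Var 1, Var 30): {cross_factor:.3f}")
```

The observed correlations are not listed in the abridged printout, so individual residuals cannot be reproduced here; `fitted_residual` only shows the definition.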
Appendix 2. 

Abridged LISREL Printout for the 11-factor Solution 



LISREL 7: 

Estimation of Linear Structural Equation Systems 
Program Version 7.16 

Distributed by: 

Scientific Software, Inc. 

1369 Neitzel Road 
Mooresville, Indiana 46158 
(317) 831-6336 

This copy authorized for use in SPSS-X. 

Program Copyright 1977-89 by Scientific Software, 
Inc., (a Michigan corporation). 

Distribution or use unauthorized by Scientific 
Software, Inc. is prohibited. 

MVS - LISREL 7.16 
By 

Karl G. Joreskog and Dag Sorbom 



The following LISREL control lines have been read: 

DA NI=94 NO=43317 MA=PM 
PM UNIT=18 
DM UNIT=17 

MO NX=94 NK=11 LX=FR TD=DI,FR PH=ST 
PA LX 
6*(1 0 0 0 0 0 0 0 0 0 0) 
5*(0 1 0 0 0 0 0 0 0 0 0) 
6*(0 0 1 0 0 0 0 0 0 0 0) 
6*(0 0 0 1 0 0 0 0 0 0 0) 
6*(0 0 0 0 1 0 0 0 0 0 0) 
4*(0 0 0 0 0 1 0 0 0 0 0) 
8*(0 0 0 0 0 0 1 0 0 0 0) 
7*(0 0 0 0 0 0 0 1 0 0 0) 
7*(0 0 0 0 0 0 0 0 1 0 0) 
6*(0 0 0 0 0 0 0 0 0 1 0) 
33*(0 0 0 0 0 0 0 0 0 0 1) 

OU DWLS TM=1500 

September 1989. Eleven factor solution. 

Number of Input Variables   94 
Number of Y-Variables        0 
Number of X-Variables       94 
Number of ETA-Variables      0 
Number of KSI-Variables     11 
Number of Observations   43317 

Warning: Chi-square, standard errors, t-values and 
standardized residuals are calculated under the 
assumption of multivariate normality. 






LISREL Estimates (Diagonally Weighted Least Squares) 
Lambda X 





           KSI 1    KSI 2    KSI 3    KSI 4    KSI 5
Var 1      0.767    0.000    0.000    0.000    0.000
Var 2      0.747    0.000    0.000    0.000    0.000
Var 3      0.701    0.000    0.000    0.000    0.000
Var 4      0.810    0.000    0.000    0.000    0.000
Var 5      0.798    0.000    0.000    0.000    0.000
Var 6      0.803    0.000    0.000    0.000    0.000
Var 7      0.000    0.710    0.000    0.000    0.000
Var 8      0.000    0.667    0.000    0.000    0.000
Var 9      0.000    0.764    0.000    0.000    0.000
Var 10     0.000    0.704    0.000    0.000    0.000
Var 11     0.000    0.745    0.000    0.000    0.000
Var 12     0.000    0.000    0.610    0.000    0.000
Var 13     0.000    0.000    0.669    0.000    0.000
Var 14     0.000    0.000    0.711    0.000    0.000
Var 15     0.000    0.000    0.545    0.000    0.000
Var 16     0.000    0.000    0.701    0.000    0.000
Var 17     0.000    0.000    0.724    0.000    0.000
Var 18     0.000    0.000    0.000    0.728    0.000
Var 19     0.000    0.000    0.000    0.685    0.000
Var 20     0.000    0.000    0.000    0.925    0.000
Var 21     0.000    0.000    0.000    0.541    0.000
Var 22     0.000    0.000    0.000    0.586    0.000
Var 23     0.000    0.000    0.000    0.873    0.000
Var 24     0.000    0.000    0.000    0.000    0.659
Var 25     0.000    0.000    0.000    0.000    0.752
Var 26     0.000    0.000    0.000    0.000    0.749
Var 27     0.000    0.000    0.000    0.000    0.723
Var 28     0.000    0.000    0.000    0.000    0.829
Var 29     0.000    0.000    0.000    0.000    0.802
Var 30     0.000    0.000    0.000    0.000    0.000
Var 31     0.000    0.000    0.000    0.000    0.000
Var 32     0.000    0.000    0.000    0.000    0.000
Var 33     0.000    0.000    0.000    0.000    0.000
Var 34     0.000    0.000    0.000    0.000    0.000
Var 35     0.000    0.000    0.000    0.000    0.000
Var 36     0.000    0.000    0.000    0.000    0.000
Var 37     0.000    0.000    0.000    0.000    0.000
Var 38     0.000    0.000    0.000    0.000    0.000
Var 39     0.000    0.000    0.000    0.000    0.000
Var 40     0.000    0.000    0.000    0.000    0.000
Var 41     0.000    0.000    0.000    0.000    0.000
Var 42     0.000    0.000    0.000    0.000    0.000
Var 43     0.000    0.000    0.000    0.000    0.000
Var 44     0.000    0.000    0.000    0.000    0.000
Var 45     0.000    0.000    0.000    0.000    0.000
Var 46     0.000    0.000    0.000    0.000    0.000
Var 47     0.000    0.000    0.000    0.000    0.000
Var 48     0.000    0.000    0.000    0.000    0.000
Var 49     0.000    0.000    0.000    0.000    0.000
Var 50     0.000    0.000    0.000    0.000    0.000
Var 51     0.000    0.000    0.000    0.000    0.000
Var 52     0.000    0.000    0.000    0.000    0.000
Var 53     0.000    0.000    0.000    0.000    0.000

Total coefficient of determination for X-variables is 1.000. 



Chi-square with 4222 degrees of freedom = 228892.07 (p = 0.000) 
Goodness of Fit Index = 0.987 
Adjusted Goodness of Fit Index = 0.986 
Root Mean Square Residual = 0.027 

Summary Statistics for Fitted Residuals 
Smallest Fitted Residual = -0.255 
Median Fitted Residual = 0.000 
Largest Fitted Residual = 0.248 
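
The degrees of freedom reported for the two solutions can be checked with a short calculation. The sketch below assumes that each model estimates one loading and one unique variance per item plus the correlations among factors, with factor variances fixed at 1.0, and that the number of moments is counted as p(p+1)/2 for p items. This counting rule is our reading of the control lines rather than something stated in the printout, and the function name is hypothetical.

```python
def cfa_degrees_of_freedom(n_items: int, n_factors: int) -> int:
    """Degrees of freedom for a simple-structure factor model.

    Assumes one free loading and one free unique variance per item,
    free correlations among all factors, and factor variances fixed at 1.
    """
    moments = n_items * (n_items + 1) // 2
    free_loadings = n_items
    free_unique_variances = n_items
    free_factor_correlations = n_factors * (n_factors - 1) // 2
    free_parameters = free_loadings + free_unique_variances + free_factor_correlations
    return moments - free_parameters


print(cfa_degrees_of_freedom(94, 3))   # three-factor solution
print(cfa_degrees_of_freedom(94, 11))  # eleven-factor solution
```

Under this counting rule the results agree with the 4274 and 4222 degrees of freedom shown in the two printouts.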